Converting Data to NDJSON with R and Uploading It to Elasticsearch

12th iThome Ironman Contest

DAY 29

Elastic Stack on Cloud

Elastic 30-Day Self-Training series, part 29

bear999

2020-09-29 23:32:10

Only two days left. Today I'll try using R's jsonlite package to convert a data frame into the NDJSON format that Elasticsearch accepts.

Honestly, the best solution is probably still [jq](https://stedolan.github.io/jq/), installed on Linux:

sudo apt install jq

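However, the examples in jq's official manual are honestly a bit hard to follow. With only two days left I didn't want to give up here, so instead I'll show a way to convert a data.frame into NDJSON.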

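library(jsonlite)
library(elastic)
# Assumption: penguins comes from the palmerpenguins package
# (the column names in the output further down match it).
library(palmerpenguins)

# Without a prefix, stream_out() prints one JSON document per line:
# stream_out(penguins)

# Write NDJSON to a file, prepending a bulk action line to every document.
stream_out(penguins,
           con = file("/home/temp123/nd.json"),
           prefix = '{ "index": { "_index" : "test4" } }\n')

# Connect to the Elastic Cloud deployment.
x <- connect(
  host = "555.japaneast.azure.elastic-cloud.com",
  path = "",
  user = "elastic",
  pwd = "555",
  port = 9243,
  transport_schema = "https"
)

# Upload the file through the Bulk API; each action line already names the index.
docs_bulk(x, "/home/temp123/nd.json")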

The key is the stream_out function: it writes a data frame as one JSON document per record, each separated by a line break:

{"species":"Chinstrap","island":"Dream","bill_length_mm":49.6,"bill_depth_mm":18.2,"flipper_length_mm":193,"body_mass_g":3775,"sex":"male","year":2009}
{"species":"Chinstrap","island":"Dream","bill_length_mm":50.8,"bill_depth_mm":19,"flipper_length_mm":210,"body_mass_g":4100,"sex":"male","year":2009}
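To see this behaviour without the full penguins dataset, here is a minimal sketch. The two-row data frame `df` and the temporary file are hypothetical stand-ins introduced only for illustration; the only assumption about jsonlite is that `stream_out` writes one JSON object per row, as described above.

```r
library(jsonlite)

# Hypothetical two-row data frame standing in for penguins
df <- data.frame(
  species = c("Chinstrap", "Adelie"),
  body_mass_g = c(3775, 3700),
  stringsAsFactors = FALSE
)

# Write the data frame as NDJSON to a temporary file
path <- tempfile(fileext = ".json")
stream_out(df, con = file(path), verbose = FALSE)

# Each line of the file is one complete JSON document
ndjson_lines <- readLines(path)
print(ndjson_lines)

# Parsing a single line gives back one record
first_record <- fromJSON(ndjson_lines[1])
```

Reading the file back with `readLines` is a quick way to confirm that the number of lines equals the number of rows.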

However, the format that Elasticsearch requires for bulk uploads is:

action_and_meta_data\n
optional_source\n

Just like the example in the official documentation:

{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
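The pairing of lines can be made concrete with a small sketch that splits the documentation example into action lines and source lines. The variable names are hypothetical, and the sketch assumes every action is one that carries a source line, such as `index` (a `delete` action has no source line and would break this simple odd/even split).

```r
library(jsonlite)

# The two lines from the documentation example above
bulk_lines <- c(
  '{ "index" : { "_index" : "test", "_id" : "1" } }',
  '{ "field1" : "value1" }'
)

# Odd lines carry the action and metadata, even lines carry the document source
actions <- bulk_lines[seq(1, length(bulk_lines), by = 2)]
sources <- bulk_lines[seq(2, length(bulk_lines), by = 2)]

# Parse the first pair to inspect what it says
action <- fromJSON(actions[1])
action_name  <- names(action)                # which bulk action to perform
target_index <- action$index[["_index"]]     # which index receives the document
document     <- fromJSON(sources[1])         # the document itself
```

Here `action_name` is the bulk operation, `target_index` is where the document goes, and `document` is what gets stored.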

So we have to add the "action_and_meta_data" line in front of every document we upload,
which means setting the prefix argument of stream_out.

Here I simply set prefix so that the target index is "test4":

{ "index": { "_index" : "test4" } }\n

That way the target index is already specified inside the NDJSON itself, so there is no need to pass an index to docs_bulk.
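Before uploading, it can be worth checking that the prefix really produced alternating action and document lines. The following is a rough sketch of such a check; the small data frame `df`, the temporary file, and the helper variable names are hypothetical, while the prefix string is the one used in this post.

```r
library(jsonlite)

# Hypothetical two-row data frame standing in for penguins
df <- data.frame(
  species = c("Chinstrap", "Adelie"),
  year = c(2009, 2007),
  stringsAsFactors = FALSE
)

# Same prefix as in the post: the trailing \n puts the action on its own line
path <- tempfile(fileext = ".json")
stream_out(df, con = file(path), verbose = FALSE,
           prefix = '{ "index": { "_index" : "test4" } }\n')

bulk_lines <- readLines(path)

# Every document should now be preceded by exactly one action line
is_action <- seq_along(bulk_lines) %% 2 == 1
target_index <- vapply(
  bulk_lines[is_action],
  function(line) fromJSON(line)$index[["_index"]],
  character(1),
  USE.NAMES = FALSE
)
documents <- lapply(bulk_lines[!is_action], fromJSON)

cat(bulk_lines, sep = "\n")
```

If the line count is twice the row count and every action line names "test4", the file has the shape the Bulk API expects.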

Day 29 was a close call, finished at 11:30 at night...
Clearly my JSON-handling skills still need work; I should find some time to get properly fluent with jq.

Tomorrow is the last day; I plan to write a wrap-up post.
